Datalog with Negation and Monotonicity
Positive Datalog has several nice properties that are lost when the language is extended with negation. One example is that fixpoints of positive Datalog programs are robust w.r.t. the order in which facts are inserted, which facilitates efficient evaluation of such programs in distributed environments. A natural question to ask, given a (stratified) Datalog program with negation, is whether an equivalent positive Datalog program exists.
In this context, it is known that positive Datalog can express only a strict subset of the monotone queries, yet the exact relationship between the positive and monotone fragments of semi-positive and stratified Datalog was previously left open. In this paper, we complete the picture by showing that there exist monotone queries expressible in semi-positive Datalog that are not expressible in positive Datalog. To provide additional insight into this gap, we also characterize a large class of semi-positive Datalog programs for which the dichotomy 'monotone if and only if rewritable to positive Datalog' holds. Finally, we give best-effort techniques to reduce the amount of negation that is exhibited by a program, even if the program is not monotone.
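The order-robustness of positive Datalog fixpoints mentioned above can be sketched in a few lines. The following is an illustrative sketch (not from the paper) using the standard transitive-closure program path(X,Y) :- edge(X,Y). path(X,Z) :- path(X,Y), edge(Y,Z). Because the immediate-consequence operator of a positive program is monotone, the least fixpoint is the same no matter in which order the edge facts arrive.

```python
from itertools import permutations

def transitive_closure(edges):
    """Naive least-fixpoint evaluation of the path/2 relation."""
    path = set(edges)                      # path(X,Y) :- edge(X,Y).
    changed = True
    while changed:                         # iterate to fixpoint
        changed = False
        for (x, y) in list(path):
            for (y2, z) in edges:
                if y == y2 and (x, z) not in path:
                    path.add((x, z))       # path(X,Z) :- path(X,Y), edge(Y,Z).
                    changed = True
    return path

edges = [("a", "b"), ("b", "c"), ("c", "d")]
# Any permutation of the input facts yields the same fixpoint:
results = {frozenset(transitive_closure(list(p))) for p in permutations(edges)}
assert len(results) == 1
```

With negation this robustness is lost, which is what motivates asking when a program can be rewritten into an equivalent positive one.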
Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query, also referred to as parallel-correctness. This paper extends the study of the complexity of parallel-correctness and its constituents, parallel-soundness and parallel-completeness, to unions of conjunctive queries with and without negation. As a by-product, it is shown that the containment problem for conjunctive queries with negation is coNEXPTIME-complete.
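To make the notion of parallel-correctness concrete, here is a minimal, self-contained sketch (all names hypothetical, not from the paper): hash-partitioning R and S on the join attribute is a parallel-correct distribution policy for Q(x,z) :- R(x,y), S(y,z), since every pair of joining facts meets on the node determined by their shared y-value.

```python
def policy(fact, p):
    """Distribution policy: map a fact to the set of nodes that receive it."""
    rel, args = fact
    y = args[1] if rel == "R" else args[0]   # position of the join attribute
    return {hash(y) % p}

def local_join(rs, ss):
    """Communication-free evaluation of Q(x,z) :- R(x,y), S(y,z) on one node."""
    return {(x, z) for (x, y) in rs for (y2, z) in ss if y == y2}

def distributed_eval(facts, p):
    """Reshuffle according to the policy, then union the local results."""
    nodes = [{"R": [], "S": []} for _ in range(p)]
    for fact in facts:
        for n in policy(fact, p):
            nodes[n][fact[0]].append(fact[1])
    return set().union(*(local_join(n["R"], n["S"]) for n in nodes))

facts = [("R", ("a", "b")), ("R", ("c", "b")), ("S", ("b", "d"))]
# Parallel-correctness: the distributed result equals the global one.
assert distributed_eval(facts, 4) == local_join(
    [a for r, a in facts if r == "R"], [a for r, a in facts if r == "S"])
```

Deciding this property for arbitrary policies and queries (here, extended with union and negation) is exactly the complexity question the paper studies.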
Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data
In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
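The idea of an oblivious broadcasting function can be illustrated with a hypothetical sketch (not the paper's BDS formalism): each node decides, looking only at its own facts, which of them to broadcast. Here a fact is broadcast only if it unifies with some atom of the conjunctive query (relation name matches and constants agree) — a sound but not necessarily locally optimal strategy; the BDS formalism captures the locally optimal ones.

```python
def matches_atom(fact, atom):
    """Does the fact unify with the query atom? Variables start with '?'."""
    rel, args = fact
    a_rel, a_args = atom
    if rel != a_rel or len(args) != len(a_args):
        return False
    return all(a.startswith("?") or a == v for v, a in zip(args, a_args))

def oblivious_broadcast(local_facts, query_atoms):
    """Decide per local fact, independent of other nodes' data."""
    return [f for f in local_facts
            if any(matches_atom(f, a) for a in query_atoms)]

# Query: Q(x, y) :- R(x, 'b'), S('b', y)
query = [("R", ("?x", "b")), ("S", ("b", "?y"))]
local = [("R", ("a", "b")), ("R", ("a", "c")), ("S", ("b", "d")), ("T", ("z",))]
assert oblivious_broadcast(local, query) == [("R", ("a", "b")), ("S", ("b", "d"))]
```

Facts that cannot contribute to any atom of the query, such as ("R", ("a", "c")) and the T-fact above, are never transmitted.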
Distribution Policies for Datalog
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
A Near-Optimal Parallel Algorithm for Joining Binary Relations
We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of Õ(N / p^{1/ρ}), where N is the total size of the input relations, p is the number of machines, ρ is the join's fractional edge covering number, and Õ hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name {\em the isolated cartesian product theorem}) that provides fresh insight into the problem's mathematical structure. Our result implies that the {\em subgraph enumeration problem}, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model. Short versions of this article appeared in PODS'17 and ICDT'20.
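As a back-of-the-envelope illustration of the load bound N / p^{1/ρ} (polylogarithmic factors ignored; the concrete numbers below are hypothetical): for the triangle query R(x,y), S(y,z), T(z,x), each edge gets weight 1/2 in the optimal fractional edge cover, so ρ = 3/2 and the per-machine load scales as N / p^{2/3}.

```python
def load_bound(n_tuples, p_machines, rho):
    """Per-machine load N / p^(1/rho), ignoring polylogarithmic factors."""
    return n_tuples / p_machines ** (1.0 / rho)

N, p = 10**9, 1000
single_relation = load_bound(N, p, rho=1.0)   # one binary relation: N / p
triangle = load_bound(N, p, rho=1.5)          # triangle query: N / p^(2/3)
assert single_relation < triangle             # joins cost more per machine
```

The larger ρ is, the weaker the speedup from adding machines, which is why matching the N / p^{1/ρ} lower bound is the natural optimality target.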
Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
Robustness Against Read Committed for Transaction Templates with Functional Constraints
The popular isolation level Multiversion Read Committed (RC) trades some of the strong guarantees of serializability for increased transaction throughput. Sometimes, however, transaction workloads can be safely executed under RC, obtaining serializability at the lower cost of RC. Such workloads are said to be robust against RC. Previous work has yielded a tractable procedure for deciding robustness against RC for workloads generated by transaction programs modeled as transaction templates. An important insight of that work is that, by modeling transaction programs more accurately, we are able to recognize larger sets of workloads as robust. In this work, we increase the modeling power of transaction templates by extending them with functional constraints, which are useful for capturing data dependencies like foreign keys. We show that incorporating functional constraints allows us to identify workloads as robust that otherwise would not be recognized as such. Even though we establish that the robustness problem becomes undecidable in its most general form, we show that various restrictions on functional constraints lead to decidable and even tractable fragments that can be used to model and test for robustness against RC in realistic scenarios.
Datalog Queries Distributing over Components
We investigate the class D of queries that distribute over components. These are the queries that can be evaluated by taking the union of the query results over the connected components of the database instance. We show that it is undecidable whether a (positive) Datalog program distributes over components. Additionally, we show that connected Datalog¬ (the fragment of Datalog¬ where all rules are connected) provides an effective syntax for Datalog¬ programs that distribute over components under the stratified as well as under the well-founded semantics. As a corollary, we obtain a simple proof for one of the main results in previous work [19], namely, that the classic win-move query is in F2 (a particular class of coordination-free queries).
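Distribution over components can be checked concretely for a given query and instance. The following hypothetical sketch (helper names are ours, not the paper's) takes components over a single binary edge relation and uses transitive closure as the query, which does distribute: evaluating per component and taking the union gives the global answer.

```python
from collections import defaultdict

def components(edges):
    """Connected components of the undirected version of the edge relation."""
    adj = defaultdict(set)
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def tc(edges):
    """Transitive closure via naive fixpoint iteration."""
    path, changed = set(edges), True
    while changed:
        changed = False
        for (x, y) in list(path):
            for (y2, z) in edges:
                if y == y2 and (x, z) not in path:
                    path.add((x, z))
                    changed = True
    return path

edges = [("a", "b"), ("b", "c"), ("x", "y")]   # instance with two components
per_component = set()
for comp in components(edges):
    per_component |= tc([(u, v) for (u, v) in edges if u in comp])
assert per_component == tc(edges)   # union over components = global result
```

Whether an arbitrary Datalog program has this property is exactly the question shown undecidable above, which is why an effective syntactic fragment such as connected Datalog¬ is valuable.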
Distributed Subweb Specifications for Traversing the Web
Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves.
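The effect of guiding traversal can be shown with a toy simulation (every name and document below is hypothetical): each document carries data triples and links, and a publisher-provided subweb specification is modeled as a predicate that prunes which links the consumer follows.

```python
def traverse(web, seeds, follow):
    """BFS over documents, collecting triples; `follow` guides the traversal."""
    seen, frontier, triples, requests = set(), list(seeds), set(), 0
    while frontier:
        url = frontier.pop()
        if url in seen or url not in web:
            continue
        seen.add(url)
        requests += 1                      # one (simulated) network request
        doc = web[url]
        triples |= set(doc["triples"])
        frontier.extend(l for l in doc["links"] if follow(url, l))
    return triples, requests

web = {
    "pub/profile": {"triples": {("alice", "knows", "bob")},
                    "links": ["pub/friends", "ads/tracker"]},
    "pub/friends": {"triples": {("bob", "knows", "carol")},
                    "links": []},
    "ads/tracker": {"triples": {("alice", "seen", "ad1")},
                    "links": []},
}

# Unguided traversal follows every link; the guided one only trusted sources.
all_triples, all_reqs = traverse(web, ["pub/profile"], lambda src, l: True)
guided, guided_reqs = traverse(web, ["pub/profile"],
                               lambda src, l: l.startswith("pub/"))
assert guided_reqs < all_reqs      # fewer network requests
assert ("alice", "seen", "ad1") not in guided   # untrusted data pruned
```

The guided run both avoids the low-quality source and issues fewer requests, which is the intuition behind the improvements observed in the paper's evaluation.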
Under consideration in Theory and Practice of Logic Programming (TPLP).
Formal Approaches to Querying Big Data in Shared-Nothing Systems
To meet today's data management needs, it is a widespread practice to use distributed data storage and processing systems. Since the publication of the MapReduce paradigm, a plethora of such systems has arisen, but although widespread, the capabilities of these systems are still poorly understood, and putting them to effective use is often more of an art than a science. As one of the causes for this observation, we identify a lack of theoretical underpinnings for these systems, which makes it hard to understand what the advantages and disadvantages of particular systems are and which, in addition, complicates the choice of a particular formalism for a particular task. In my PhD thesis, we zoom in on several important aspects of query evaluation using clusters of servers, including coordination and communication, data skew, load balancing, and data partitioning, and propose a set of elegant and theoretically sound frameworks and theories that help to understand the applicable limitations and trade-offs.